Advanced Geospatial Data Processing for Social Scientists
Dennis Abel & Stefan Jünger
2025-04-29
Day
Time
Title
April 28
10:00-11:15
Introduction
April 28
11:15-11:30
Coffee Break
April 28
11:30-13:00
Raster data in R
April 28
13:00-14:00
Lunch Break
April 28
14:00-15:15
Raster data processing
April 28
15:15-15:30
Coffee Break
April 28
15:30-17:00
Graphical display of raster data in maps
April 29
10:00-11:15
Datacube processing I
April 29
11:15-11:30
Coffee Break
April 29
11:30-13:00
Datacube processing II & API access
April 29
13:00-14:00
Lunch Break
April 29
14:00-15:15
Data integration and linking (with survey data)
April 29
15:15-15:30
Coffee Break
April 29
15:30-17:00
Outlook and open session with own application
Refresher: geospatial data and their implementation in R
Why care about data types and formats?
There are differences in how spatial information is stored, processed, and visually represented.
Different commands for data import and manipulation
Spatial linking techniques and analyses partly determined by data format
Visualization of data can vary
So, always know what kind of data you are dealing with!
Vector and raster data
Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019
Vector data
Representing the world in vectors
The surface of the earth is represented by simple geometries and attributes.
Each object is defined by longitude (x) and latitude (y) values.
It could also include z coordinates…
Vector data: geometries
Every real-world feature is one of three types of geometry:
Points: discrete location (e.g., a tree)
Lines: linear feature (e.g., a river)
Polygons: enclosed areas (e.g., a city, country, administrative boundaries)
National Ecological Observatory Network (NEON), cited by Datacarpentry
Vector data: attribute tables
Only geometries means that we do not have any other information.
We must assign attributes to each geometry to hold additional information \(\rightarrow\) data tables called attribute tables.
Each row represents a geometric object, which we can also call observation, feature, or case
Each column holds an attribute or, in ‘our’ language, a variable
Vector data: attribute tables
File formats/extensions
GeoPackage .gpkg
Shapefile .shp
GeoJSON .geojson
…
Sometimes, vector data come even in a text format, such as CSV
Welcome to simple features
Several packages are out there to wrangle and visualize spatial and, especially, vector data within R. We will use the sf package (“simple features”).
Why?
simple features refers to a formal standard representing spatial geometries and supports interfaces to other programming languages and GIS systems (ISO 19125-1).
The first step is, of course, loading the data. We want to import the .geojson file for the administrative borders of the whole world called World_Countries.geojson.
Simple feature collection with 251 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -89 xmax: 180 ymax: 83.6236
Geodetic CRS: WGS 84
# A tibble: 251 × 6
FID COUNTRY ISO COUNTRYAFF AFF_ISO geometry
<int> <chr> <chr> <chr> <chr> <MULTIPOLYGON [°]>
1 1 Afghanistan AF "Afghanist… "AF" (((61.27655 35.60725, 61…
2 2 Albania AL "Albania" "AL" (((19.57083 41.68527, 19…
3 3 Algeria DZ "Algeria" "DZ" (((4.603354 36.88791, 4.…
4 4 American Samoa AS "United St… "US" (((-170.7439 -14.37555, …
5 5 Andorra AD "Andorra" "AD" (((1.445836 42.60194, 1.…
6 6 Angola AO "Angola" "AO" (((23.47611 -17.62584, 2…
7 7 Anguilla AI "United Ki… "GB" (((-63.16778 18.16445, -…
8 8 Antarctica AQ " " " " (((-180 -84.30535, -179.…
9 9 Antigua and Barbuda AG "Antigua a… "AG" (((-61.73806 16.98972, -…
10 10 Argentina AR "Argentina" "AR" (((-71.85916 -41.01128, …
# ℹ 241 more rows
We can already plot it
plot(hello_world["FID"])
This is the bounding box
Inspect your data: classics
There are no huge differences between the file we just imported and a regular data table.
# head of data tablehead(hello_world, 2)
Simple feature collection with 2 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 19.28854 ymin: 29.40611 xmax: 74.91574 ymax: 42.66035
Geodetic CRS: WGS 84
# A tibble: 2 × 6
FID COUNTRY ISO COUNTRYAFF AFF_ISO geometry
<int> <chr> <chr> <chr> <chr> <MULTIPOLYGON [°]>
1 1 Afghanistan AF Afghanistan AF (((61.27655 35.60725, 61.29638 35…
2 2 Albania AL Albania AL (((19.57083 41.68527, 19.58195 41…
Inspect your data: spatial features
Besides our general data inspection, we may also want to check the spatial features of our import. This check includes the geometric type (points? lines? polygons?) and the coordinate reference system.
# type of geometrysf::st_geometry(hello_world)
Geometry set for 251 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -89 xmax: 180 ymax: 83.6236
Geodetic CRS: WGS 84
First 5 geometries:
Inspect your data: spatial features
Each polygon is defined by several connected points to build an enclosed area. Several polygons in one data frame have the sf type multipolygons. Just as the world consists of several states, the polygon of the world consists of several smaller polygons.
# extract the simple features column and further inspecting attr(hello_world, "sf_column") |> dplyr::pull(hello_world, var = _) |> dplyr::glimpse()
sfc_MULTIPOLYGON of length 251; first list element: List of 1
$ :List of 1
..$ : num [1:708, 1:2] 61.3 61.3 61.4 61.4 61.4 ...
- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
Inspect your data: spatial features
Remember: The Coordinate Reference System is critical. A crucial step is to check the CRS of your geospatial data.
Often we want to combine geospatial datasets from different sources. The choice of an appropriate method depends, once again, on the data format. In the world of vector data, we usually speak of “spatial joins” when linking two vector datasets. Let’s pull in more spatially narrowed data.
It would be nice, if we had regional identifiers in our points data but there aren’t any. This issue shows where spatial joins come in handy. We can use sf::st_join() to extract this information.
class : SpatRaster
dimensions : 4, 4, 1 (nrow, ncol, nlyr)
resolution : 1, 1 (x, y)
extent : 0, 4, 0, 4 (xmin, xmax, ymin, ymax)
coord. ref. :
source(s) : memory
name : lyr.1
min value : 1
max value : 95
We can already plot it
terra::plot(raster_layer)
File formats/extensions
GeoTIFF tif
Gridded data .grd
Network common data format .nc
Esri grid .asc
…
Sometimes, raster data come even in a text format, such as CSV
Implementations in R
terra is the most commonly used package for raster data in R.
Some other developments, e.g., in the stars package, also implement an interface to simple features in sf. We will work with stars later in this course.
The terra package helps to employ zonal statistics. But the spatstat package is even more elaborated.
Loading raster tiffs (raw WorldPop data)
Loading raster data in terra is straightforward. For this purpose, we use the function terra::rast().
class : SpatRaster
dimensions : 934, 1100, 1 (nrow, ncol, nlyr)
resolution : 0.008333333, 0.008333333 (x, y)
extent : 5.87375, 15.04042, 47.27458, 55.05792 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source(s) : memory
varname : deu_ppp_2020_1km_Aggregated
name : deu_ppp_2020_1km_Aggregated
min value : -901.1108
max value : 1609.1487
Loading raster data with stars
Later, we will work with multidimensional raster data cubes. While it is possible to work them a bit also in terra, stars is way more elaborated. Just that you already can see, how to import raster data with stars, here’s an example.
stars object with 2 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu.
deu_ppp_2020_1km_Aggregated... 0.01720133 7.404476 14.82611 60.1534 32.34949
Max. NA's
deu_ppp_2020_1km_Aggregated... 2277.353 83687
dimension(s):
from to offset delta refsys point x/y
x 1 1100 5.874 0.008333 WGS 84 FALSE [x]
y 1 934 55.06 -0.008333 WGS 84 FALSE [y]
stars objects can also be plotted
plot(pop_grid_ger_2020_stars)
Finally: Transforming the CRS
As mentioned before, we may have to change the CRS also of our raster data. Going back to terra, this is not called ‘transforming’ but ‘projecting’ (which makes sense, right?).